ABSTRACT

Air quality prediction is a difficult task because of the dynamic nature, instability, and high
variability of air pollution in time and space. Air pollution is increasing, especially in urban areas,
and various pollutants affect air quality. Hence, it is essential to predict air pollution accurately so
that early warning of its hazardous impact can be given. Machine learning methods have been
developed for this purpose, but they have difficulty forecasting pollutant and particulate levels
accurately and predicting the air quality index (AQI). A Bilateral Transformative Broken-Stick
Regression-based Quadratic Weighted Emphasis Boost Classification (BTBSR-QWEBC) technique is
introduced for IoT-based air pollution forecasting with higher accuracy and minimum time
consumption. In the BTBSR-QWEBC technique, IoT devices are used to collect air quality data. The
technique comprises three major processes, namely pre-processing, feature selection, and
classification. Together, these processes help to improve the accuracy of the air pollution forecast
and to minimize time consumption. Experimental assessment is performed with various metrics,
namely air pollution forecasting accuracy, error rate, air pollution forecasting time, and space
complexity. The observed results show that the BTBSR-QWEBC technique provides better accuracy
and lower time consumption than conventional techniques.

Keywords: Air pollution forecast, feature selection, classification technique, AQI, regression


1. Introduction 

Air is one of the most important factors for all living creatures on the earth. Due to rapid
industrialization, air pollution has become a significant problem in both developed and developing
countries. Air pollution forecasting is an important step in air quality management to decrease the
negative impact of pollution on the environment and on people's health. Existing forecasting models
generally perform air pollution forecasting but fail to carry out the forecasting modelling
effectively.


2. Experimental Methodology 

Air pollution monitoring is a significant and challenging problem since it supports managing the
environment and strengthening air pollution control. In a real dataset, the number of air pollutants
and the number of instances within the training set grow large. Therefore, feature selection is
necessary to reduce the dimensionality of the dataset. Moreover, the raw data collected by IoT
devices contains noise, which increases time and space complexity. These problems are addressed by a
novel technique called BTBSR-QWEBC, which is based on three different processes. The three
processes of the BTBSR-QWEBC technique are described in this section.

Figure 1 illustrates the architecture of the BTBSR-QWEBC technique, which includes three
processes: pre-processing, feature selection, and classification. IoT devices are used to collect the
air quality data. First, pre-processing is performed using the bilateral discretized Z-transform to
obtain a noise-free dataset. Second, Otsuka inducive Broken-stick regression is used to select the
significant features, which minimizes the time complexity of air pollution forecasting. Finally, the
Quadratic Weighted Emphasis Boost technique is applied to classify the data by analysing the
selected features. Based on the classification, accurate air pollution forecasting is performed by
measuring the air quality index. A detailed explanation of the proposed BTBSR-QWEBC technique is
presented in the following subsections.


[Figure 1 stages: data collection, wavelet transform, Otsuka indexive Broken-stick regression
(feature selection), Emphasis Boost technique, accurate air pollution forecasting]

Figure 1 Architecture of proposed BTBSR-QWEBC technique


3. Otsuka inducive Broken-stick regression 

The second process of the BTBSR-QWEBC technique performs feature selection using Otsuka
inducive Broken-stick regression. Feature selection is a significant step when building a machine
learning model: it finds the best feasible features for building the model. Selecting the relevant
features minimizes the air pollution forecasting time.

Figure 3 Flow diagram of Otsuka indexive Broken-stick regression


Figure 3 illustrates the flow process of Otsuka inducive Broken-stick regression-based feature
selection, which obtains the relevant features in order to minimize time complexity. Let us consider
the features $\beta_1, \beta_2, \beta_3, \ldots, \beta_n$. Broken-stick regression is a statistical
procedure that evaluates the relationship among the variables (features), here using the Otsuka
similarity index. Broken-stick regression segments the input into two parts based on a breakpoint,
and the decision is made according to the Otsuka similarity index, which calculates the similarity
among features.


Here, $\rho$ denotes the similarity coefficient measured between features $\beta_i$ and $\beta_j$.
The similarity coefficient ranges between -1 and +1.
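
As a rough illustration of the similarity-based selection described above, the following Python sketch computes the Otsuka (Ochiai) coefficient in its cosine form, which lies between -1 and +1 for real-valued columns, and then splits feature scores into two segments at a breakpoint. The function names, the choice of the cosine form, and the breakpoint value are assumptions made for illustration; the paper does not specify them.

```python
import math


def otsuka_similarity(x, y):
    """Cosine-form Otsuka (Ochiai) similarity between two feature columns.

    Assumption: features are real-valued sequences of equal length.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # A constant-zero column carries no direction; treat as dissimilar.
        return 0.0
    return dot / (norm_x * norm_y)


def broken_stick_split(similarities, breakpoint):
    """Split feature indices into two segments around a breakpoint.

    Hypothetical rule: scores at or above the breakpoint are kept as
    relevant, the rest are discarded as irrelevant.
    """
    relevant = [i for i, s in enumerate(similarities) if s >= breakpoint]
    irrelevant = [i for i, s in enumerate(similarities) if s < breakpoint]
    return relevant, irrelevant
```

In such a sketch, each candidate feature would be scored against a reference column and only the indices in the first returned list would be passed on to the classifier.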


3.1 Quadratic weighted emphasis boost technique 

Finally, the proposed BTBSR-QWEBC technique performs classification to forecast air pollution
using the Quadratic weighted emphasis boost technique, with the objective of improving the
accuracy and reducing the time involved in air pollution monitoring. The quadratic weighted
emphasis boost technique is a machine learning ensemble classification method. A weak learner is a
classifier that has difficulty providing a true classification; in contrast, a strong learner provides a
true classification.

Figure 4 Schematic construction of Quadratic weighted emphasis boost technique

Figure 4 depicts the schematic construction of the weighted emphasis boost ensemble technique for
accurate forecasting in less time. The technique takes as input a training sample set $\{D_i, Z\}$,
where $D_i = D_1, D_2, \ldots, D_m$ denotes the sample air data and $Z$ specifies the ensemble
classification outcome. As shown in Figure 4, the boost technique initially constructs a set of $B$
weak learners $C_1, C_2, C_3, \ldots, C_B$, and their outcomes are summed to build a strong
classification result. The weighted emphasis boost ensemble technique uses a Kernelized support
vector classifier as the weak learner for forecasting air pollution with the selected feature set.
Let us consider the training sample set $\{D_i, Z\}$, for which the output values are known. The
main aim of the Kernelized support vector classifier is to produce the best line or decision boundary
that separates the n-dimensional space into different classes. The extent of the hyperplane depends
on the number of output classes. The data points that are closest to the hyperplane and influence its
position are called support vectors.


A separating hyperplane for the input satisfies the equation

$$H:\ \alpha \cdot D_i + d = 0$$

where $H$ indicates the hyperplane or decision boundary, $D_i$ indicates the training samples (i.e.
the input data, the air pollutants), $d$ denotes the bias, and $\alpha$ indicates the normal weight
vector to the hyperplane $H$. If the training samples are linearly separable, parallel support vectors
are selected that divide the input into different classes. Hence, the input data are assigned to a side
of the decision boundary by

$$K_1:\ \alpha \cdot D_i + d > 0$$

$$K_2:\ \alpha \cdot D_i + d < 0$$

where $K_1$ and $K_2$ denote the support vectors positioned above and below the boundary. The
predicted output of the support vector classifier using the kernel function is

$$Z = \sum_{i} \alpha_i\, y_i\, \vartheta(AQI_t, AQI_r)$$

where $Z$ represents the predicted classification result, $\vartheta(AQI_t, AQI_r)$ indicates the
kernel function that calculates the relationship between the testing Air Quality Index value
($AQI_t$) and the training Air Quality Index ($AQI_r$), and $\alpha_i$ denotes the weights of the
training samples.
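
The two quantities above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the sample is assumed to be a plain list of pollutant values, and the kernel values are assumed to have been computed beforehand.

```python
def hyperplane_side(normal, sample, bias):
    """Report on which side of the hyperplane normal . sample + bias = 0 a sample lies.

    Returns "K1" for a positive score, "K2" for a negative score,
    and "H" when the sample lies exactly on the boundary.
    """
    score = sum(w * x for w, x in zip(normal, sample)) + bias
    if score > 0:
        return "K1"
    if score < 0:
        return "K2"
    return "H"


def kernel_output(alphas, labels, kernel_values):
    """Weighted kernel sum: sum_i alpha_i * y_i * kernel_i.

    Assumption: labels are +1/-1 and kernel_values[i] is the kernel
    between the test AQI and the i-th training AQI.
    """
    return sum(a * y * k for a, y, k in zip(alphas, labels, kernel_values))
```

The sign of the weighted kernel sum then plays the same role for the kernelized classifier that the sign of the linear score plays for the plain hyperplane.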


The air quality index for each sampled data point is calculated from the average concentration of
the air pollutants selected by the feature selection process (i.e. PM2.5, PM10, SO2, NOx, NO2) and
the maximum of CO (carbon monoxide) and O3 (ozone):

$$AQI = \mathrm{Avg}(PM_{2.5}, PM_{10}, SO_2, NO_x, NO_2) + \max(CO, O_3) \tag{11}$$

From equation (11), the air quality index value AQI is measured based on the average values of
PM2.5, PM10, SO2, NOx, and NO2, and the maximum of CO and O3. Here, the Laplace RBF kernel is
applied to measure the relationship between the testing Air Quality Index value and the training Air
Quality Index. The Laplace RBF kernel $\vartheta(AQI_t, AQI_r)$ is expressed as follows:

$$\vartheta(AQI_t, AQI_r) = \exp\left(-\frac{|AQI_t - AQI_r|}{v^2}\right)$$
where $v$ indicates a deviation. The training Air Quality Index that is most similar to the testing
Air Quality Index value determines the class; in other words, the test sample is assigned to the class
of the training Air Quality Index closest to it. Six classes of air quality are considered: good,
satisfactory, moderate, poor, very poor, and severe. The Laplace RBF kernel function gives a
similarity ranging from 0 to 1. If the similarity is high (i.e. close to 1), accurate classification is
attained. Thus, the different classes of air quality estimation are obtained.
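
The AQI computation and the kernel-based class assignment described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names, the default deviation value, and the format of the training set (pairs of AQI value and class label) are hypothetical and not taken from the paper.

```python
import math


def air_quality_index(pm25, pm10, so2, nox, no2, co, o3):
    """AQI as the mean of five pollutant values plus the larger of CO and O3."""
    return (pm25 + pm10 + so2 + nox + no2) / 5.0 + max(co, o3)


def laplace_rbf(aqi_t, aqi_r, v):
    """Laplace RBF kernel exp(-|AQI_t - AQI_r| / v^2); the result lies in (0, 1]."""
    return math.exp(-abs(aqi_t - aqi_r) / (v ** 2))


def classify(aqi_t, training, v=10.0):
    """Return the label of the training AQI most similar to the test AQI.

    `training` is a list of (aqi_r, label) pairs; v=10.0 is an arbitrary
    illustrative deviation, not a value from the paper.
    """
    best_label, best_similarity = None, -1.0
    for aqi_r, label in training:
        similarity = laplace_rbf(aqi_t, aqi_r, v)
        if similarity > best_similarity:
            best_label, best_similarity = label, similarity
    return best_label
```

Because the kernel decreases monotonically with the distance between the two AQI values, picking the highest similarity is equivalent to picking the nearest training AQI.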
The observed weak learner results contain some training errors during classification. Therefore, the
weak learner results are summed to make a strong classification result:

$$Z = \sum_{i=1}^{B} C_i$$

where $Z$ indicates the ensemble output and $C_i$ indicates the $i$-th weak learner. A weight is
initialized for each weak learner to make a strong classification outcome:

$$Z = \sum_{i=1}^{B} C_i\, y_i$$

where $y_i$ indicates the weight of the weak learner result, initialized as a random integer. The
ensemble technique uses the weighted emphasis function to measure the quadratic error of the weak
classification results:

$$E_q = \exp\left[p\left(\left(\sum_{i=1}^{B} C_i y_i - Z\right)^2 - (1-p)\left(\sum_{i=1}^{B} C_i\right)^2\right)\right]$$

where $E_q$ indicates the weighted emphasis function, $p$ indicates a weighting parameter, $Z$
represents the actual ensemble classification result, $\sum_{i=1}^{B} C_i y_i$ indicates the predicted
classification result of the weak learners with the weights $y_i$, and $\sum_{i=1}^{B} C_i$ the
predicted result without weights. The weighting parameter $p$ is set to 1 to obtain the final
quadratic error:

$$E_q = \exp\left[\left(\sum_{i=1}^{B} C_i y_i - Z\right)^2\right]$$

Finally, the weak learner weight is updated based on the error calculated above. If a weak learner
classifies correctly, its weight is decreased; otherwise, its initial weight is increased. The weak
learner with the lowest error is selected as the strong classification outcome, which gives better
accuracy.
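
A minimal Python sketch of the weighted sum, the emphasis error, and the weight update is given below. The function names and the fixed update step are assumptions for illustration; the paper states only the direction of the weight update, not its size.

```python
import math


def ensemble_output(weak_outputs, weights):
    """Weighted sum of weak learner outputs: sum_i C_i * y_i."""
    return sum(c * y for c, y in zip(weak_outputs, weights))


def emphasis_error(weak_outputs, weights, target, p=1.0):
    """Weighted emphasis function E_q.

    With p = 1 this reduces to exp((sum_i C_i * y_i - Z)^2).
    """
    weighted = ensemble_output(weak_outputs, weights)
    unweighted = sum(weak_outputs)
    return math.exp(p * ((weighted - target) ** 2 - (1 - p) * unweighted ** 2))


def update_weight(weight, classified_correctly, step=1):
    """Decrease the weight after a correct classification, increase it otherwise.

    The step size of 1 is a hypothetical choice.
    """
    return weight - step if classified_correctly else weight + step
```

Note that the exponential grows very quickly with the squared error, so in practice the outputs would need to be kept on a small scale to avoid overflow.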


4. Results and Discussion 
The performance of BTBSR-QWEBC and two existing methods, IMD-VAE [1] and CLS [2], is discussed
with respect to different parameters, namely air pollution forecasting accuracy, error rate, and air
pollution forecasting time. These metrics are described below.

4.1 Impact of air pollution forecasting accuracy 
Air pollution forecasting accuracy is calculated as the proportion of air quality samples that are
accurately forecasted to the total number of air quality samples considered for experimental
evaluation. It is formulated as

$$APF_{acc} = \sum_{i=1}^{n} \frac{DF_{Acc}}{D_i} \times 100$$

where $APF_{acc}$ indicates the air pollution forecasting accuracy, $DF_{Acc}$ denotes the air
quality data forecasted accurately, and $D_i$ denotes the total number of air quality sample data. It
is measured as a percentage (%).
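
The accuracy metric above amounts to the share of correctly forecasted samples. A minimal Python sketch follows; the function name and the use of class labels as inputs are assumptions for illustration.

```python
def forecasting_accuracy(predicted, actual):
    """Percentage of samples whose predicted class matches the actual class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual) * 100.0
```

For example, three correct forecasts out of four samples give an accuracy of 75%.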

Table 3: Air Pollution Forecasting Accuracy

Air quality sample | Air pollution forecasting accuracy (%)
data (numbers)     | BTBSR-QWEBC | IMD-VAE | CLS
10000              | 94.35       | 84.56   | 89.25
20000              | 93.25       | 83.75   | 88.75
30000              | 91.5        | 82.96   | 87.66
40000              | 91.         | 82.22   | 86.37
50000              | 90.24       | 81.2    | 85.2
60000              | 89.25       | 80.83   | 84.01
70000              | 88.85       | 79.14   | 83
80000              | 88          | 730     | 82.75
90000              | 87.11       | 78.11   | 81.38
100000             | 365         | 75      | 80


Table 3 reports the air pollution forecasting accuracy versus the number of air quality sample data,
ranging from 10000 to 100000. For each method, ten results were recorded with different numbers of
inputs. Table 3 compares the accuracy of BTBSR-QWEBC with that of two existing methods,
IMD-VAE [1] and CLS [2]. The observed results show that BTBSR-QWEBC achieves higher accuracy
than both. For example, with 10000 air quality samples, BTBSR-QWEBC attains an accuracy of
94.35%, whereas IMD-VAE [1] and CLS [2] attain 84.56% and 89.25% respectively. Likewise, nine
further outcomes were observed, and the overall accuracy of BTBSR-QWEBC was compared with the
conventional techniques. Across the ten comparisons, the air pollution forecasting accuracy of
BTBSR-QWEBC is improved by 11% and 6% compared with IMD-VAE [1] and CLS [2] respectively.
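
The paper does not state how the 11% and 6% figures are derived. One plausible reading, assumed here, is the relative improvement over each baseline averaged across the ten rows. The Python sketch below computes that quantity for a single row, using the 10000-sample accuracies quoted above; the function name is hypothetical.

```python
def relative_improvement(proposed, baseline):
    """Percentage gain of the proposed accuracy over a baseline accuracy."""
    return (proposed - baseline) / baseline * 100.0


# Accuracies for 10000 samples, as reported in the text.
gain_vs_imd_vae = relative_improvement(94.35, 84.56)
gain_vs_cls = relative_improvement(94.35, 89.25)
print(round(gain_vs_imd_vae, 2), round(gain_vs_cls, 2))
```

Averaging this quantity over all ten rows would give the overall improvement under the stated assumption.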


[Figure 6: chart comparing BTBSR-QWEBC, IMD-VAE, and CLS; y-axis: Air pollution forecasting
accuracy (%); x-axis: Air quality sample data (numbers)]

Figure 6 Air pollution forecasting accuracy versus Air quality sample data


Figure 6 portrays the comparison of air pollution forecasting accuracy using three methods, namely
BTBSR-QWEBC and two existing methods, IMD-VAE [1] and CLS [2]. In the chart, the air pollution
forecasting accuracy is plotted on the y-axis and the number of air quality sample data on the
x-axis. BTBSR-QWEBC enhances accuracy by utilizing the Quadratic Weighted emphasis boost
ensemble classification technique. A Kernelized support vector classifier is employed for analysing
the testing and training data. The ensemble method combines weak learners and provides strong
classification results.
The proposed ensemble technique sets the weak learners as Kernelized support vector classifiers
with the input samples. The weak learner initializes the number of classes and computes the air
quality index. Then the Laplace RBF kernel is used to compute the relationship between the training
and testing AQI of a particular class. The weak learner results are weighted and summed. The
emphasis function is applied to measure the quadratic error. The weak learner with the least
quadratic error is selected as the final strong classification result. In this way, accurate air quality
forecasting is performed in minimum time.


Conclusion 

Air quality monitoring faces the challenging issue of early air pollution warning. Identifying the
different pollutant factors that contribute to air pollution plays a fundamental role in achieving an
efficient scheme for decreasing air pollution. BTBSR-QWEBC is effective, with higher accuracy and
minimum time consumption. The Bilateral discretized Z-wavelet transform removes noisy data from
the dataset. Next, Otsuka indexive Broken-stick regression-based feature selection is performed to
find the relevant features, which minimizes the time and memory consumption of air pollution
forecasting. Then, the Quadratic Weighted emphasis boost ensemble classification is applied to
predict future outcomes by constructing a number of weak learners. The weak learners are combined
into a strong classifier by finding the minimum error. As a result, accurate classification is
performed, which improves forecasting performance. The experimental results of the BTBSR-QWEBC
technique and existing classification techniques are evaluated with different metrics such as air
pollution forecasting accuracy, error rate, air pollution forecasting time, and memory consumption.
The experimental results show that the BTBSR-QWEBC technique achieves higher forecasting
accuracy with minimum time, memory consumption, and error rate.